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Abstract 

A new adaptive hybrid optimization strategy, entitled squads, is pro- 
posed for complex inverse analysis of computationally intensive physical 
models. Typically, models are calibrated and model parameters are es- 
timated by minimization of the discrepancy between model simulations 
characterizing the system and existing observations requiring a substan- 
tial number of model evaluations. The new strategy is designed to be 
computationally efficient and robust in identification of the global op- 
timum (e.g. maximum or minimum value of an objective function), ft 
integrates a global Adaptive Particle Swarm Optimization (APSO) strat- 
egy with a local Levenberg-Marquardt (LM) optimization strategy using 
adaptive rules based on runtime performance. The global strategy opti- 
mizes the location of a set of solutions (particles) in the parameter space. 
The LM strategy is applied only to a subset of the particles at different 
stages of the optimization based on the adaptive rules. After the LM 
adjustment of the subset of particle positions, the updated particles are 
returned to the APSO strategy. Therefore, squads is a global strategy 
that utilizes a local optimization speedup. The advantages of coupling 
APSO and LM in the manner implemented in squads is demonstrated by 
comparisons of squads performance against Levenberg-Marquardt (LM), 
Particle Swarm Optimization (PSO), Adaptive Particle Swarm Optimiza- 
tion (APSO; the TRIBES strategy), and an existing hybrid optimization 
strategy (hPSO). All the strategies are tested on 2D, 5D and 10D Rosen- 
brock and Griewank polynomial test functions and a synthetic hydroge- 
ologic application to identify the source of a contaminant plume in an 
aquifer. Tests are performed using a series of runs with random initial 
guesses for the estimated (function/model) parameters. The performance 
of the strategies are compared based on their robustness, defined as the 
percentage of runs that identify the global optimum, and their efficiency, 



quantified by a statistical representation of the number of function eval- 
uations performed prior to identification of the global optimum. Squads 
is observed to have the best performance when both robustness and effi- 
ciency are taken into consideration than the other strategies for all test 
functions and the hydrogeologic application. 



1 Introduction 



Models are often used in the geosciences to indirectly estimate unknown (not 
observable) physical properties of a system based on observable quantities rep- 



1986 Dahlin 2001 Jessell 



resenting system behavior ( Carrera and Neuman 
|2001 | |Meek 2001 Poeter and McKenna 19951. In this process, the mathemat 



ical model is designed to simulate the system behavior f(9) for a given set of 
model parameters 6 representing the actual physical properties of the system. 
The more accurately the model matches the observations, the more representa- 
tive the model parameters are assumed to be. The process of making inferences 
about model parameters, commonly referred to as inverse modeling, regularly 
results in difficult optimization problems where a set of model parameters capa- 
ble of acceptable representation of system behavior is sought. The optimization 
process is based on a metric representing the discrepancy between the model 
simulations f(0) and the system observations. The discrepancy metric is also 
called the objective function (OF; $(0)), and is a function of model parameters 
0. In the parameter space, the metric is represented by a multi-dimensional 
response hyper-surface; a three-dimensional surface for the case of two model 
parameters. The response surface typically has a complex shape due to multiple 
minima (representing multiple plausible solutions) and flat regions (representing 
insensitivity of model parameters to the OF). An optimization process is based 
on a series of guided model evaluations for different model parameter sets. The 
challenges in the optimization process come from complications in identifying 
the global minima and from requirements to execute a substantial number of 
model evaluations. Frequently, the number of model evaluations needed for opti- 
mization can vary from about 100 to more than 10 6 depending on the complexity 
of the inverse model. As a result, the optimization process can be especially 
difficult in real- world applications using physical models where a single forward 
model simulation is performed from several minutes to more than an hour. In 



these situations, even efficient parallel techniques (e.g. Vesselinov et al. ( 2001 )) 
can cause substantial computational burden. Therefore it is important to de- 
velop computationally efficient and robust strategies that can identify the global 
minimum with a relatively small number of model evaluations. 



Optimization strategies can be classified as global and local strategies (No 
cedal and Wright 1999[ ). Global strategies excel at robust exploration of the 
response surface, identifying multiple areas of attraction; however, global strate- 
gies are inefficient at locating the parameter set producing an optimal solution 
within an area of attraction. As a result, in the case of real world model in- 
versions, the application of global strategies may be infeasible when the model 
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evaluations take a substantial amount of computational time (Keating et al. 



20101. Local strategies excel at efficiently identifying the optimal model param- 



eters within an area of attraction; however, local strategies are not designed 
for robust exploration of a response surface outside of an area of attraction. 
The local strategies are efficient within an area of attraction because they uti- 
lize local information about the gradient and curvature of the response surface. 
This requires estimation of the first and second order derivatives of the dis- 
crepancy metric in the parameter space. As many real world model problems 
have response surfaces with multiple minima, the use of local strategies alone 
is not always robust. One of the most commonly used local strategies is the 
Levenberg-Marquardt (LM) strategy which has been applied in many frequently 
used inverse analysis and parameter estimation codes in the geosciences such as 
UCODE flPoeter and Hill] [l999| > and PEST flDohertyj [2005] ) . 

Global and local strategies are complimentary; where one excels, the other 
struggles, and vice versa. The benefits of hybrid global/local strategies have 



been demonstrated previously ( 


Noel and Jannett 2004| jLeontitsis 2004 


Zhang 


et al. 


2007| Ghaffari-Miab et al. 


20071. We introduce a new development in 



hybrid optimization, coupling recent developments in Adaptive Particle Swarm 
Optimization (APSO) and a Levenberg-Marquardt (LM) strategy producing a 
novel adaptive hybrid strategy entitled squads. The strategy applies an LM 
strategy to a subset of particles at different stages of an APSO strategy based 
on adaptive rules. After the LM update of the particle position, the particle 
is passed back to the APSO strategy and continues to evolve based on APSO 
rules. In essence, squads is a global strategy utilizing local optimization speedup. 
Squads is specifically designed to be a robust and computationally efficient strat- 
egy capable of identifying the global minimum with a relatively small number of 
model evaluations in complex inverse problems. The name squads refers to the 
hierarchical structure of the population of solutions in the algorithm, similar to 
the APSO algorithm TRIBES (perc] |Jul. 2004|). 



The squads strategy is tested using the Rosenbrock (Rosenbrock 19601 and 
Griewank (Griewank 1981 1 polynomial test functions. Squads is also applied 
to solve a hydrogeological problem related to identification of the source of a 
contaminant plume in an aquifer; this problem is frequently solved in appli- 



cations related to protection and remediation of groundwater resources (Bagt- 



zoglou et al. |199l] |Snodgrass and Kitanidis 1997| [Atmadji and~T3 agtzoglou 



2001 Sun et al.l 2006 Dokou and Pinder 2009). In order to demonstrate the 



relative benefits of the hybrid strategy of squads, its performance is compared 



to open-source distributions of LM (Lourakis Jul. 2004), PSO (Particle Swarm 



Central 



20061 ), and APSO (|Clerc[ | Jul. 2004[ ) strategies. Additionally, squads is 
compared to hPSO (Leontitsis, 2004), an open-source hybrid strategy that com- 
bines PSO and the Nelder-Mead down hill simplex strategy flNelder and Mead 



1965) implemented in the MATLAB® (|The MathWorks Inc.l 120031) computing 



environment. 
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2 Particle Swarm Optimization 



Sociobiologists have theorized that individuals within a population can benefit 
from the previous knowledge and experience of other members of the popula- 



tion while searching for sporadically distributed food sources (Wilson 1975) 



The ubiquity of schooling and flocking tendencies common among many species 
suggests that this is an efficient, cost-effective strategy for the survival of in- 
dividuals. It is easy to recognize the analogy of organisms searching for food 
sources and mathematical strategies searching for optimal solutions. This recog- 
nition led to the development of PSO by |Kennedy and Eberhart" ( 1995 ) , building 
on previous research intended to graphically simulate the flocking behavior of 
birds. Certain aspects of the flocking behavior of this early research has been 
eliminated in order to improve the strategy's performance in global optimiza- 
tion, leading to the use of the term "swarm" to describe the graphical behavior 
of PSO. 

The development of PSO has produced a parsimonious optimization strat- 
egy modeling a population of randomly selected initial solutions (particles) 
by their position and velocity (Clerc 2006). In a £>-dimensional parameter 



space, the position and velocity of the ith particle can be represented as p, = 
\Pi,i,Pi,2, ■ ■ ■ ,P%,d] and v* = [ui,i, Wj,2, ■ ■ ■ respectively. An empirical for- 

mula for determining the swarm size S has been suggested as S = 10 + 2>/D 



ited, denoted as gi 



( Particle Swarm Central 2006 1 . Particles retain a record of the best location 
they have visited so far denoted as b, = [£><,!, b»,2) . . . , &i,z>]< Particles are also 
informed of the best location that K other randomly chosen particles have vis- 



Ji,2, 



, piz?]- A standard value for K is 3 (Particle 



Swarm Central 2006). These networks of informers are reinitialized after it- 
erations with no improvement to the global best particles of the swarm. The 
velocity of the ith particle in the jth dimension is updated from strategy itera- 
tion k to k + 1 as 



Vij(k+1) = wv ii j(k)+ciri(bi ! j-pij(k))+C2r2{gi,j-Pi,j(k)), k = {1,.. .,£>}, 

(1) 

where w is a constant referred to as the inertia weight, c\ and C2 are constants 
referred to as acceleration coefficients, r± and r 2 are independent uniform ran- 
dom numbers in [0,1]. The parameter w controls the level of influence between 
its previous and current particle displacement, c\ and C2 scale the random influ- 
ence of (1) the particle memory (past particle locations in the parameter space), 
and (2) the current network of particle informers (current informer locations in 
the parameter space), respectively. A limitation on the magnitude of the veloc- 
ity V m ax is commonly employed. The particle position is updated during each 
strategy iteration as 

p i j(k + l)=p itj (k)+v i , j (k+l), k = {l,...,D}. (2) 

It has been recognized that the selection of w, c\, C2, and V max tune the 
performance of PSO, modifying the balance between exploration (spreading 
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the particles throughout the parameter space) and intensification (focusing the 
particles within an area of attraction). Manual tuning of PSO's parameters can 
be a delicate task. Adaptive PSO (APSO) strategies have emerged in order to 
reduce or eliminate the often difficult and time-consuming process of parameter 



tuning of PSO ( Cooren et al. 2009) 



One of the algorithmic variants of APSO is TRIBES ( perc] |2006[ ) (TRIBES 
is not an acronym, but we follow the convention of all capital letters as proposed 
by its designer), which eliminates the tuning of the PSO strategy parameters. 
The strategy has been proven competitive with well-known strategies on a suite 



of test problems (Cooren et al. 2009). As the name suggests, TRIBES parti- 



tions the particles into groups, referred to as tribes, intended to facilitate the 
exploration of multiple areas of attraction. In this way, a hierarchical structure 
is established where the swarm is composed of a network of tribes, and each 
tribe is a network of particles. The intent is to eliminate parameter tuning 
as the swarm evolves from an initial set of tribes, and the tribes evolve from 
single particles based on rules governing the evolution of the swarm topology 
and rules for generation and elimination of entire tribes and individual particles 
within the tribes. The particle within a tribe with the lowest /highest OF for 
minimization/maximization is considered the shaman of the tribe. Information 
is shared only between the particles within a given tribe. Information between 
the tribes is shared only through the shamans. In this way, the displacements 
of non-shaman particles are influenced by the shaman of their tribe, while the 
displacements of the shamans are influenced by the best shaman in the swarm. 
The source code for TRIBES is available from iClercl (I Jul 2004 1. 



3 Squads adaptive hybrid optimization 



Various approaches have been introduced to couple the global search capabili- 



ties of PSO with the efficiency of first and second-order local strategies. Clerc 



( 1999 ) introduced a PSO strategy that adjusts particle locations based on ap- 



proximations of the gradient of the OF utilizing the OF values of the current 
particle locations. Noel and Jannett (20041 developed a hybrid PSO strategy 



incorporating gradient information directly in the calculation of particle veloc- 
ity. Leontitsis] ( 2004 ) coupled a PSO strategy with the Nelder-Mead simplex 



strategy (hPSO, |Lagarias et al. ( 1998 1 ), [Zhang et al. (20071 coupled PSO and 



back-propagation to train neural networks. Ghaffari-Miab et al. (20071 devel 



oped a hybrid strategy, iterating between PSO and BFGS quasi-Newton opti- 
mization. We present a hybrid strategy called squads that couples an APSO 
strategy with a Levenberg-Marquardt (LM) strategy. The following provides a 
detailed description of a fine-tuned coupling of APSO and LM based on adaptive 
rules, where the LM optimization is applied to improve the locations of a subset 
of selected particles (the shamans) in the course of the optimization process. 



The current APSO strategy implemented in squads is TRIBES (Clerc 2006) 



and the LM optimization is performed using the LevMar library ( Lourakis Jul 
20041. 
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Much of the time-consuming and difficult tuning required of many optimiza- 
tion strategies is reduced in squads utilizing adaptive rules. The APSO strategy 
does not require the specification of optimization parameters ( Clerc 2006 1 , and 
the applied LM strategy is optimized to work well on many problems using de- 



fault and internally estimated optimization parameters (Lourakis Jul. 2004) 



The adaptive rules implemented in squads to control the performance of LM 
speedups during the APSO optimization are also designed to be general and 
capable to tackle problems with different complexity. 

A flow diagram of the squads strategy is presented in Figure [3j Tables [T] 
and [2] describe the particle initialization and displacement rules and their selec- 
tion within squads. For consistency with other global strategies discussed here, 
squads is initialized with N t = S = 10 + 1\fD mono-particle tribes, where N t 
is the number of tribes in the swarm and S is the number of particles. How- 
ever, squads can also be initiated with a single mono-particle tribe and allow 
the swarm to develop based on the built-in adaptive rules. If provided, one 
of the initial particles is set to predefined values (rule 1 in Table [lj , while the 
remaining positions of the initial particles are determined according to rule 5 in 
Tabled] 

Each iteration of the strategy is initiated by determining the informers for all 
the particles. For non-shaman particles, this will be the shaman of their tribe. 
A shaman is the particle with the best (e.g. lowest for minimization) OF value 
within the tribe. For shaman's, this will be the shaman with the best OF value 
within the swarm, referred to as the best shaman. Particle positions are then 
updated according to the rules described in Table [2] Particles are initialized to 
use displacement rule 1. After informers are determined, particle positions are 
updated based on their currently selected displacement rule. 

The decision to adapt a tribe is based on whether the tribe has demonstrated 
sufficient improvement during the previous strategy iteration. This is performed 
stochastically, by comparing the fraction of particles in the tribe that improved 
their location in the last move with a random number between and 1. If the 
fraction is greater than the random number, the tribe is considered a good tribe, 
and the worst particle is removed from the tribe. This eliminates unnecessary 
model evaluations, focusing the attention of the tribe on the good particles. 
Otherwise, the tribe is considered a bad tribe, and a particle is added to the 
tribe (refer to Table [I] for details on particle initialization rule selection) and a 
randomly selected dimension of a randomly selected particle in the tribe (other 
than the shaman) is reinitialized randomly. Adding a particle to a bad tribe is 
intended to increase the exploration of the parameter space by the tribe. 

The swarm adaptation occurs either every Nt * (N t — l)/4 strategy iterations 
or if the swarm is labeled by LM as a bad swarm. A swarm is considered a bad 
swarm if LM speedup was performed in the previous iteration, and the OF was 
not reduced by at least 2/3 for all the LM updated shamans. A mono-particle 
tribe is added to the swarm if it is considered bad according to rule 5 in Table [T] 
The tribe led by a shaman with the worst OF in the swarm is removed if the 
swarm is considered good. 

Next, particle displacement rule selections are updated. Particle displace- 
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Select informers 



Update position!! 



T 




Adapt swarm 



I 



Update strategies 




Update shamans 
with LM 




Shaman local 
random search 




Figure 1: Flow diagram of squads strategy. E is the current number of model 
evaluations and E max is the allowable number of model evaluations. Decisions 
to "adapt swarm" or "update shamans with LM" are determined by adaptive 
rules. 
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Table 1: Particle initialization rules and their selection criteria. 

Particle initialization rules: 

1. User specified 

2. Randomly chosen position within parameter space: 

Pnewj = U {Pminj > Pmaxj) i j 1, . . . 

3. Randomly chosen within hyperparallelepid surrounding the best position 
of the swarm with dimensions (2 • rj) determined by Euclidean distance 
between the swarm's and tribe's best position: 

Tj = \Pbestj - Ptribe best, I j = 1, . . . , D 

Pne Wj =U(p 6estj . - r j ,p bestj + rj)j = 1, . . . , D 

4. On one of the vertices of the parameter space with equal probability of 
being the max or min of each dimension: 

if (U(0, 1) < 0.5) then p neWj = p min . , else p neWj = p maXj j = 1, . . . , D 

5. Randomly chosen within the largest empty hyperparallelepid of the pa- 
rameter space 



Criteria 


Initialization rule selection 


First particle of the strategy 


1 


If initial population is greater 
than 1, other initial particles 


5 


Particle added to "bad" tribe (tribe adaptation) 


randomly chosen between 2 and 5 


Mono-particle tribe added (swarm adaptation) 


5 


LM unable to reduce OF of shaman by 2/3 


5 
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Table 2: Particle displacement rules and their selection criteria based on the 
status of the particle. N(/x, a) is a normal distribution with a mean fi and 
standard deviation a, U(a, b) is a uniform distribution with minimum a and 
maximum 6, /(— ) is the value of the objective function, g = [gi, g2, . . . , <?d] 
is the location of the particles designated informer, b = [61, 62, . . . , bo] the 
particle's current best location, and mirij and maxj are the minimum and 
maximum values for the jth dimension, respectively. 

Particle displacement rules: 

1. pj —\J(minj,maxj) j = 1, . . . , D, change displacement rule to 2 for next 



time 




+ \bj-gj\) + 



bj)) 



/(g)+/(b) 



/(b) 



• UG& - \bj 



Particle status 



Displacement rule selection 



("=) 
(==) 
(+=) or (++) 



randomly choose any rule other than current one 
randomly choose between rule 2 and 3 
change to rule 1 with 50% probability 
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ment rule selections are modified based on whether or not (1) their position has 
improved in the last move and (2) their best overall position has improved in the 
last move. Following the convention of Clerc ( |2006 ), we use a (+) to indicate 



improvement, (=) the same OF value, and (-) a worse position. The particles 
performance can then be denoted as one of the following: (-=), (==), (+=), 
and (++), where the first symbol indicates if the particle improved its position 
in the last move, and second symbol indicates if the overall best position of the 
particle improved in the last move. Note that the best overall performance can 
only stay the same or improve, and an improvement in the overall performance 
indicates an improvement over the last position. Table [2] lists the displacement 
rule selection based on particle performance. 

After the swarm adaptation, squads checks whether or not to update the 
shamans using LM. LM updating is turned off in squads if none of the shamans 
reduces the OF of the previous shamans by more than 2/3 during the last LM 
updating. LM updating will be restarted when the best OF of the previously ob- 
tained OF during LM has been reduced by an order of magnitude by the APSO 
strategy. This postpones LM until the global APSO strategy has identified a 
position with a significant improvement, which will perhaps be a previously 
unidentified area of attraction. After the LM optimization, the new shaman 
location is used in the APSO strategy. 

In contrast with the APSO strategy, the LM strategy requires that the OF 
be represented as a summation of components at least equal to the number of 
parameters as 



N 



$(0) = ]>>(<?), (3) 

i=l 

where 6 is a vector of model parameters and N is equal or larger than the number 
of model parameters. This allows the LM strategy to estimate the local gradient 
and curvature of the response surface in the parameter space. These calculations 
utilize numerical derivatives of the OF components in equation[3]with respect to 
the model parameters (also called the Jacobian matrix) . Based on the Jacobian 
matrix, the LM strategy also estimates the second-order derivatives of the OF 
components with respect to model parameters (also called a Hessian matrix). 
The second-order derivatives approximate the local curvature of the response 
surface. The LM strategy searches for the local optimum by adaptive adjust- 



ment between first and second-order optimization techniques (Levenberg 1944 



Marquardt 19631) . Frequently in the case of model inversion problems, the OF 
in equation |3j is represented by the discrepancy between model simulated values 
fi(9) and corresponding observations Oi, where i = 1,...,N, and N is now the 
number of observations. For example, frequently $(0) is computed as 

N N 

$(0) = £$ 4 (0)=£(/,:(0)- Ol ) 2 . (4) 

i=i i=i 

Squads estimates the first-order derivatives using a finite difference approach 
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applied in the LevMar library (Lourakis Jul. 2004). 



The following criteria are defined by default in LevMar to terminate the LM 



optimization (Lourakis Jul. 2004), and applied in the LM updating of squads 
as well: (1) the maximum change in any parameter is less than 10 -5 ; (2) the 
relative change in the L2 norm of the change in the parameter values is less than 
10~ 5 of the L2 norm of the parameter values; (3) the OF reaches a value of zero; 
(4) the Jacobian matrix is close to singular, and (5) the maximum number of 
LM iterations (i.e. derivative approximation and Marquardt parameter value 
exploration) is achieved (50 when standalone LM is performed; 8 in squads). 
The criteria are designed to terminate LM once it successfully identifies a local 
optimum. Typically, criteria 1, 2, and 5 terminate the LM updating in squads 
(the termination criteria of the LM updating within squads do not terminate the 
squads run). Squads is terminated when either one of the following conditions 
are met: (1) E max , the number of allowable model evaluations, is exceeded or 
(2) the OF reaches below a predefined cutoff value. 

The final step of each iteration is to perform a random local search in the 



empty space around each shaman (Clerc Jul. 2004). In this step, a random 



position within the largest hyperparallelepid centered on the tribe's shaman, 
void of other particles, is evaluated. If the position is an improvement over the 
current shaman position, the shaman is moved to this location. Otherwise, the 
position is forgotten. 

Global strategies in general, including APSO, are designed to operate on 
a bounded parameter space. The parameter ranges are typically predefined 
depending on the physical constraints or prior knowledge about the parameter 
distributions. However, the LM optimization by default works in an unbounded 
parameter space. There are various techniques to constrain an LM strategy 
within a parameter space, but these techniques typically have a negative impact 
on LM performance. To avoid this, squads operates in a transformed parameter 
space. For example, an element of the parameter vector 6 is transformed as 



9 max 



1 , (5) 



where 9 is the transformed parameter, and 6 max and 9 m in are the upper and 
lower bounds for parameter 9, respectively. The APSO strategy is performed 
in the transformed parameter space bounded within [— 7r/2;7r/2] in all dimen- 
sions, while the LM updating is performed unconstrained in the transformed pa- 
rameter space. Model (function) evaluations are performed on de-transformed 
parameters by 



sin(0) + 1 



0- (6) 



In this way, the LM updating is unaware of parameter boundaries and is unaf- 
fected by performance issues associated with calculating numerical derivatives 
near boundaries. It should be noted that in the process of the LM updating, 
the transformed parameters can be moved outside of the [— 7r/2;7r/2] range; 
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however, the transformed parameters are returned to equivalent values within 
[— 7r/2;7r/2] before being passed back to the APSO strategy by 

Qapso = arcsin(sin(0 LM )). (7) 

where Olm represents the unconstrained transformed parameters resulting from 
LM updating and Qapso represents the constrained transformed parameters 
passed back to the APSO strategy, thereby ensuring that APSO receives pa- 
rameters within its explicitly defined, bounded parameter space. It is important 
to note that Oapso an d Olm are equivalent in the non-transformed parameter 
space. 



4 Test functions 

The squads strategy is tested by optimizing the Rosenbrock and Griewank test 
functions. The Rosenbrock and Griewank functions present difficult optimiza- 
tion problems exhibiting frequently observed complexities in response surface 



Rosenbrock 


(1960 


); 


Griewank 


(1981 


); 


Clerc 



p006| ; [Cbbren et al.| p009| ). 

The response function defined by the Rosenbrock function is comprised of 
a large valley with an ill-defined, shallow global minimum. For D < 3, the 
function is unimodal with a global minimum at x = 1 (where 1 = [1, . . . , 1]). 
For 4 < D < 7, a local minimum exists at (x\, X2, ■ ■ ■ , xd) = (—1, 1, . . . , 1) in 
addition to the global minimum, while for D > 7, multiple suboptimal local 



minima exist (Shang and Qiu 2006). In the case of two model parameters, the 
shape of the Rosenbrock function is presented in Figure [2j The Rosenbrock 
function generalized to any number of dimensions greater than or equal to two 
can be expressed as 



D-l 



* r (x 1} ...,x D ) = ^(1 - x>) 2 + 100(x l+1 -xf) 2 



(8) 



i=i 



The estimation of the local gradient and curvature of the response surface by 
LM requires the test function to be represented as a summation of parts as in 
equation [3] The summation components of <fr r (xi, . . . , xd) can be expressed as 



$r,2i-lOi) = (1 -Xif i < D 



(9) 



and 



<5>rM x ii x i+i) = 100(x l+ i - x\f i < D (10) 

producing 2(D — 1) OF components where equation [9] and 10 define the odd 
and even numbered components, respectively; therefore, the number of compo- 
nents (called also "observations" in the case inverse problems) in the 2D, 5D, 
and 10D cases are 2, 8, and 18, respectively. The LM strategy uses the deriva- 
tives of <& rt i(xi, . . . , xd) with respect to model parameters to evaluate the local 
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Griewank Rosenbrock 




Figure 2: Rosenbrock and Griewank polynomial test functions with global min- 
ima at (1,1) and (0,0), respectively. Note the different parameter ranges on the 
top and bottom rows. The top row shows the parameter space explored by the 
optimization strategies. The bottom row focuses on the parameter space near 
the respective global minima. 



13 



gradient and curvature of the response surface. In most real world problems, 
the analytical computation of derivatives is not feasible. Therefore, in all the 
examples presented below, the derivatives are computed numerically using a 
finite difference approach, even though the analytical derivation in this case is 
trivial. Other alternative representations of $ r as a sum of components are also 
possible. 

The D-dimensional Griewank function is defined as 

%ixi ,..., XD) ^i + ^^-f[ cos f^Y (ii) 

i— 1 i — 1 ^ V ' 

The Griewank function has numerous local areas of attraction, but a single 
global minimum of zero at x = 0. In the two-dimensional case, the function 
has the shape of an "egg carton" that is depressed in the center, as depicted in 
Figure [2]). 

The summation components can be defined as 

• D '-5 + ^5-5n~(^) ( 12 > 

Therefore, the number of components ("observations") equals the number 
of model parameters. 

The multidimensional Griewank function is important for testing of hybrid 
optimization strategies because it becomes more difficult to minimize for global 



strategies as its dimensionality increases (Locatelli 2003). However, although 
counterintuitive, the Griewank function becomes easier to minimize for local 
strategies as the dimensionality increases. Therefore, with the increase in di- 
mensionality, it is expected that LM performance will improve while the PSO, 
TRIBES and hPSO performance will decrease. For different parameter-space 
dimensionality, the performance of hybrid strategies will depend on how effi- 
ciently they adaptively balance between the local and global strategies. At low 
dimensionality (D = 2), the hybrid strategies should benefit from the global 
strategy; at high dimensionality, the hybrid strategies should benefit from the 
local strategy. 



5 Contaminant source identification test case 

Optimization strategies are commonly employed to calibrate physics-based mod- 
els to available observations. We demonstrate the optimization strategies on a 
hydrogeologic application to identify the center (x s ,y s ) and dimensions (xd,yd) 
of a parallelepiped contaminant source in an aquifer using observations of con- 
taminant concentrations from monitoring wells near the expected source loca- 
tion. The synthetic groundwater flow and transport problem is three-dimensional 
and semi-infinite, the top model boundary aligns with the top of the aquifer, 
and the model extends to infinity laterally and with depth. The locations of 
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Figure 3: Map of monitoring well locations. The "true" source is shown as a 
solid rectangle. The search domain for x s and y s is shown as a dotted rectangle. 
The contaminant concentration plume at t = 49 years is represented by the 
color map. 



the monitoring wells, depths below the aquifer top boundary of the top and 
bottom of the screens, times of observation after the contaminant release, and 
observed contaminant concentrations are presented in Table [5j The parameter 
values used to generate the "true" concentrations at the monitoring wells and 
the minimum and maximum parameter values allowed in optimization runs are 
presented in Table [5} The "true" location of the source, the location of monitor- 
ing wells, and contaminant concentrations at t — 49 years since the contaminant 
was released are presented in Figure [3j A similar model is presented in [Harp 
and Vesselinov (2011) with additional details. 



The objective function for the contaminant transport test case is a sum-of- 
the-squared-residuals (SSR) expressed as 



N 



(13) 



where 6,(0) is the simulated concentration resulting from 0, Ci is the i^ 1 
observed concentration, and TV is the number of observations. In summary, 
there are 4 unknown model parameters constrained by 20 observations. 

The simulated contaminant concentrations (c) are produced from an analyt- 
ical contaminant transport model encoded in MADS (Wexler 1992 Wang and 
Wu] |2009| |Vesselinov[ |2010[ ) (refer to |Harp and Vesselinov| ( |2011[ ) for additional 
simulation details). Due to the rounding of the observed concentrations, a value 
of $=0.55 is obtained from the true parameter values. 

The response surface resulting from equation [13] plotted as a function of x s 
and y s is presented in Figure [4] The values plotted in Figure [4] are the lowest 
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Table 3: Well coordinates, screen top (z top ) and bottom (zfc ot ) depths below the 
water table, and year and value of observed contaminant concentrations. 





x a [m] 


Vs [m] 


Xd[m] 


Vd[m] 


'true' 
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1393 
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273 


min 


210 
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1 


1 


max 


1460 
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500 


500 



Table 4: 'True', minimum, and maximum parameter values for the contaminant 
transport test case. 
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Figure 4: Contaminant source identification response surface for contaminant 
source locations denned by x s and y s . The minimum OF value for each combi- 
nation of x s and y s are plotted considering allowable ranges for source lateral 
dimensions, Xd and yd- 



values for $ at each combination of x s and y s considering allowable ranges for 
Xd and yd- In other words, this is the response surface that the strategies would 
traverse if they knew the optimal values for Xd and yd for each combination of 
x a and y s . Note that the actual 4D response surface is more complicated than 
this 2D representation. Features from both the Griewank and Rosenbrock test 
functions can be seen in this representation of the response surface with multiple 
areas of attraction (three suboptimal minima and one global minimum), regions 
of parameter insensitivity (flat regions), and narrow, curved (banana-shaped) 
valleys. 

Even though the contaminant transport model is analytical, the computa- 
tional time is substantially higher than that for the test functions. Within 
MADS, the number of function evaluation per second is 40,000 for the test 
functions compared to 400 for the contaminant transport model. For hPSO, 
the number of function evaluations per second is 11,000 for the test functions 
compared to 2 for the contaminant transport model; the substantial increase in 
hPSO computation time between the test functions and the hydrogeologic appli- 
cation is due to external coupling between the Matlab computing environment 
(applied to execute hPSO) and external C based transport simulator. 



6 Results and discussion 



The performance of squads on the Rosenbrock and Griewank functions is com- 
pared with (1) LM, (2) PSO, (3) TRIBES, and (4) hPSO. The LM strategy 



is an implementation of LevMar ( Lourakis Jul. 2004 ) (the same strategy as 



applied in squads), PSO is an implementation of Standard PSO 2006 (Particle 
|Swarm Central 2006), TRIBES is an APSO strategy implemented in the code 
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described in Clerc (2006), hPSO is freely available hybrid optimization code 



from Leontitsis (2004). LM, PSO, TRIBES and squads are built into the code 
MADS (Vesselinov 2010), which is utilized for all analyses except hPSO. The 



hPSO analysis is performed using MATLAB version 7.8.0.347 (R2009a) (The 



MathWorks Inc. , 2003). The strategy parameters (i.e. optimization parameters) 



for PSO and hPSO are set to values that have been demonstrated to perform 



well in many test cases (Particle Swarm Central, 2006) as w = 0.72, c\ = 1.2 
and c-2 = 1.2 (refer to equation [lj) . 

The strategies are tested on both functions by performing 1,000 independent 
optimizations runs with random initial guesses distributed in the searchable pa- 
rameter space bounded by [-100:100] for all dimensions. In the case of LM, the 
searchable parameter space is not bounded. This did not influence its perfor- 
mance as the OFs of both functions have generally increasing trends towards 
the boundaries (Figure [2]). Optimization success is defined as identifying a so- 
lution with all parameters values within 0.1 of the global minimum parameter 
values (x = 1 for the Rosenbrock function and x = for the Griewank func- 
tion). The maximum number of function (model) evaluations (E max ) for the 
strategies is set to 20,000. However, in performed analyses, LM runs terminate 
at fewer function evaluations as the convergence criteria of LM are designed to 
terminate its run once it identifies a minimum in the response surface. The 
ability of LM to identify the global minimum depends on whether the minimum 
encountered by LM is local or global. 

Figures [5] and [6] present boxplots for the number of function evaluations for 
successful runs for 2D, 5D, and 10D Rosenbrock and Griewank functions, re- 
spectively. In the figures, the boxes represent the 25^ n to 75^ n percentile ranges, 
the bars inside of the boxes represent the median values, and the whiskers rep- 
resent the minimum and maximum values. The fraction of successful runs out 
of the attempted runs are presented above the boxes. Note that the statistical 
definitions of the boxplots are not accurate for the cases where the number of 
successful runs does not present a statistically significant sample. The robust- 
ness of the strategies is defined as the percentage of successful runs (i.e. fraction 
of successful runs * 100). The efficiency of the strategies is summarized by the 
statistics presented in the boxplots. 

For the Rosenbrock function (Figure [5|, the robustness of LM decreases 
from the 2D case to the 10D case from 36% to 0%. The robustness of PSO 
and TRIBES is comparable in the 2D case, albeit with TRIBES exhibiting 
higher efficiency in general. In the 5D case, PSO has a higher robustness than 
TRIBES, however, at lower efficiency. hPSO achieves 100% robustness in the 
2D and 5D cases, with a significant decrease in efficiency from the 2D to 5D 
case. The robustness of hPSO decreases significantly in the 10D case with only 
a single success out of 1000 (0.1%). In the 10D case, LM, PSO, TRIBES, 
and hPSO exhibit low robustness. Squads is 100% robust in all cases. The 
efficiency is observed to decrease from the 2D case to the 10D case for squads; 
however, the efficiency of squads is greater than PSO, TRIBES, and hPSO in 
all cases. The efficiency of squads and LM are similar for the 2D and 5D cases 
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Figure 5: Boxplots of number of function evaluations to reach the global min- 
imum for the 2D, 5D, and 10D Rosenbrock function. The boxes represent the 
25^ to 75*^ percentile ranges, the bars inside of the boxes represent the me- 
dian values, and the whiskers representee min and max values. Note that the 
statistical definitions are not accurate when the number of successful runs does 
not present a statistically significant sample. The fraction of successful runs out 
of 1000 for each strategy is stated above the boxes. The maximum allowable 
function evaluations for each run is 20,000. 
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Figure 6: Boxplots of number of function evaluations to reach the global min- 
imum for the 2D, 5D, 10D Griewank function. The boxes represent the 2h^ 11 
to 75*k percentile ranges, the bars inside of the boxes represent the median 
values, and the whiskers represent the2&in and max values. Note that the sta- 
tistical definitions are not accurate when the number of successful runs does 
not present a statistically significant sample. The fraction of successful runs out 
of 1000 for each strategy is stated above the boxes. The maximum allowable 
function evaluations for each run is 20,000. 



(Figure [5]). However, in these two cases, the robustness of squads is 100% which 
is considerably better than the robustness of LM (36% for 2D and 4% for 5D). 
In the 10D case, LM did not produce a single successful run while squads is still 

100% robust. 

For the Griewank function (Figure [6]), as expected (see Locatelli (2003)), the 
robustness of LM increases as the dimensionality of the problem increases. In the 



2D case, which is the most difficult for a local gradient-based strategy (Locatelli 



2003 1, the robustness is only 3%. Since LM is a local strategy, it is not surprising 



that LM frequently converges at non-optimal minima. As expected for the 2D 
case, the global strategies (PSO, TRIBES, hPSO and squads) are substantially 
more robust than LM. The robustness of PSO and TRIBES (both purely global 
strategies) decrease significantly from the 2D to the 5D case, while decreasing 
only slightly from the 5D to the 10D case (the efficiency of PSO decreases also). 
hPSO is 100% robust for the 2D case; however, is unable to locate the global 
minimum in the 5D and 10D cases. Squads is 100% robust in the 2D and 10D 
cases and 80% robust in the 5D case. 

As already discussed, the multidimensional Griewank function is important 
for testing of hybrid strategies such as squads. For different parameter-space 
dimensionality, the performance of squads is influenced by the ability of the 
adaptive rules in the optimization algorithm to balance between the local (LM) 
and global (APSO) strategies. With the increase of dimensionality, the local 
gradient-based (LM) strategy becomes more robust, while the global (APSO) 
strategy becomes less robust. At D = 2, squads is both more robust and 
efficient than the other global methods (Figure [6]). At D — 10, squads benefits 
from the local gradient-based search strategy which performs better at higher 
dimensions. The 5D Griewank function is observed to be the most difficult 
test problem for squads as both the local and global strategies struggle in this 
dimensionality of the Griewank function. Nevertheless, for the 5D case, squads 
produces the highest robustness (80%) and efficiency (excluding LM) of all the 
tested strategies; squads is 100% robust if the maximum number of function 
evaluations is increased to 70,000 (results are not shown here). In summary for 
the Griewank cases, squads is observed to have the best performance when both 
robustness and efficiency are taken into consideration than the other strategies 
(Figure [6]) . 

The performance of the strategies is demonstrated on the hydrogeologic ap- 
plication in contaminant source identification presented in Section [5] A boxplot 
of the necessary function evaluations for successful runs is presented in Figure [7j 
As with the test functions, 1000 runs are performed for each strategy, except 
for hPSO, where only 100 runs are performed due to the computational expense 
of evaluating the contaminant transport model from hPSO (2.2 function evalu- 
ations per second). The maximum number of function evaluations allowed for 
each optimization run is limited to 5,000. An optimization run is considered 
successful if the objective function is reduced below a value of 1. This ensures 
that the solution has reached the area of attraction around the global minimum 
as the suboptimal minima of the response surface are all greater than 1. As 
with the test functions, it is observed that LM is efficient on the hydrogeologic 
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Figure 7: Boxplots of number of function evaluations to reach the global mini- 
mum for the hydrogeologic application presented in Section [5] The boxes rep- 
resent the 25 to 75 percentile ranges, the bars inside of the boxes represent 
the median values, and the whiskers represent the min and max values. Note 
that the statistical definitions are not accurate when the number of successful 
runs does not present a statistically significant sample. The fraction of success- 
ful runs (out of 1000 for LM, PSO, TRIBES, and squads; out of 100 for hPSO) 
is stated above the box. The maximum allowable function evaluations for each 
run is 5000. 



application, but only approximately 26% robust. PSO and TRIBES are ob- 
served to be inefficient and not robust in this case requiring high numbers of 
function evaluations with approximately 3% and 0.7% robustness, respectively. 
hPSO demonstrates some robustness at 69%, but with low efficiency in gen- 
eral with a large variability in the necessary number of function evaluations. 
squads demonstrates high robustness at 100% with higher efficiency than PSO, 
TRIBES, and hPSO. While the efficiency of LM is better than squads in this 
case, this is with a significantly lower robustness. 

It is important to emphasize in all test cases, squads can converge at a rela- 
tively low number of model evaluations when compared to PSO, TRIBES and 
hPSO. This is manifested by the minimum values of the boxplots in Figures [5j 
[6j and [7] Furthermore, the statistical distributions of the number of model eval- 
uations required to achieve the global minimum for squads are skewed to the 
left in all cases (Figures [5| [6j and [7]) . This demonstrates that more frequently 
squads may converge with lower number of functional evaluations. 

7 Conclusions 

A new adaptive global hybrid optimization strategy called squads is developed 
for solving computationally intensive inverse problems involving models repre- 
senting the behavior of complex systems. Squads utilizes a (1) global strategy 
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for robust exploration of the parameter space to identify multiple areas of at- 
traction and (2) a local gradient-based search strategy to efficiently locate the 
optimum of areas of attraction. In essence, squads is a global strategy that 
implements a local gradient-based optimization speedup. Squads is sufficiently 
robust in avoiding becoming stuck in local minima during the optimization as 
typically observed in the case of local gradient-based strategies. The new strat- 
egy reduces the number of model runs typically required of other frequently 
used global strategies such as Particle Swarm Optimization (PSO) by efficiently 
exploring local areas of attraction. 

The strategy is tested on 2D, 5D, and 10D variations of two commonly used 
polynomial test functions: the Rosenbrock and Griewank functions. The strat- 
egy is also demonstrated on a synthetic hydrogeologic application to identify 
the source center and source dimensions of a contaminant plume in an aquifer 
based on observed contaminant concentrations at monitoring wells. The robust- 
ness of a strategy is defined as the percentage of runs that identify the global 
minimum in each test case. The efficiency of a strategy is evaluated through a 
statistical representation of the number of function or model evaluations neces- 
sary to identify the global optimum. In all cases, squads is as robust or more 
robust than the other tested strategies: LM, PSO, TRIBES, and hPSO. Squads 
is more efficient than PSO, TRIBES, and hPSO in all cases. For the Rosenbrock 
function, squads has comparable efficiency to LM, however, in these cases, the 
robustness of squads (100%) is considerably better than the robustness of LM 
(less than 36%). For other optimization problems, squads may converge for 
the same number of functional evaluations as LM (Figure [6| . For the Griewank 
function and hydrogeologic application, LM successfully converges to local areas 
of attraction; however, these were not always the global minimum. For the 2D 
Rosenbrock function, LM converges within an area of attraction shaped as a 
narrow curved valley where the gradient is so small that it appears to LM that 
it has identified a minima (the valley contains the global minimum). 

The application of the squads strategy is performed using the code MADS 
(Vesselinov 2010). MADS and other files needed to execute the synthetic prob- 
lems presented in this paper are available at |http: / /www.ees.lanl.gov/staff/monty/codes/mads.html 
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